The circular drift-diffusion model (CDDM) is a stochastic sequential sampling model that describes the choices and response times observed in tasks with a circular decision space (Smith, 2016). Like many other accumulator models, the CDDM assumes that participants accumulate information in favor of a given response option over time until a response criterion is met. This process is characterized as a random walk that starts at the origin of a circle representing the decision space and moves toward its circumference. Once the circumference is reached, the radian corresponding to the point of intersection determines the choice, and the number of steps taken to get there determines the response time.
The CDDM has four parameters, most of which are illustrated in Fig 1. The parameter not shown is the nondecision time \(\tau\). The remaining parameters are the boundary radius \(\eta\), which serves as the response criterion, and a pair of parameters describing the overall speed and direction of the information accumulation process. These last two parameters can be expressed in Cartesian coordinates (i.e., the mean displacement on the X and Y coordinates per unit of time, \(\{\mu_x, \mu_y\}\)) or polar coordinates (i.e., the average speed and expected direction, \(\{\delta,\theta\}\)).
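The two parameterizations are related by \(\mu_x = \delta\cos\theta\) and \(\mu_y = \delta\sin\theta\). A minimal sketch of the conversion (the function names here are ours, not part of the accompanying code) is:

```r
# Convert the drift vector between polar and Cartesian parameterizations
polar2cartesian <- function(delta, theta){
  c("mu.x" = delta * cos(theta),  # mean displacement on X per unit time
    "mu.y" = delta * sin(theta))  # mean displacement on Y per unit time
}

cartesian2polar <- function(mu.x, mu.y){
  c("delta" = sqrt(mu.x^2 + mu.y^2),          # average speed
    "theta" = atan2(mu.y, mu.x) %% (2 * pi))  # expected direction in [0, 2*pi)
}
```

The modulo operation keeps the recovered direction inside the circular decision space, \([0, 2\pi)\).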
Fig 1. Graphical illustration of the CDDM
In this document, we present a comparison between different sampling algorithms that generate data under the CDDM. We describe how each of these algorithms works and highlight how they differ in the computation time they require and in their accuracy. To start, we will generate n = 5000 bivariate observations using an arbitrary set of parameter values.
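The original code chunk defining these values is not reproduced here. As a hypothetical example, a list with the structure expected by the samplers below (the same structure as `par2` later in this document) might look like:

```r
# Hypothetical parameter values (polar parameterization); the exact
# values used for the original run are not shown in this document
par <- list("drift"    = 1,    # average speed (delta)
            "theta"    = pi,   # expected direction (theta, in radians)
            "tzero"    = 0.1,  # nondecision time (tau, in seconds)
            "boundary" = 3)    # boundary radius (eta)
n <- 5000  # number of bivariate observations to generate
```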
Each pair of observations is obtained by emulating the random walk process described by the CDDM. We emulate the information accumulation process by iteratively sampling random movements in the X and Y directions using \(\mu_x\) and \(\mu_y\), until the accumulated evidence reaches the boundary radius \(\eta\).
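The random walk just described can be sketched as follows. The function name, the step size `dt`, and the unit diffusion coefficient are our assumptions for illustration, not the implementation used in this document:

```r
# Sketch of a single random-walk emulation under the CDDM (assumed
# step size `dt` and unit diffusion coefficient)
rw.cddm.one <- function(mu.x, mu.y, boundary, tzero, dt = 0.001){
  x <- y <- 0                 # random walk starts at the origin
  steps <- 0
  sd.step <- sqrt(dt)         # per-step noise for unit diffusion
  while (sqrt(x^2 + y^2) < boundary) {   # walk until the circumference
    x <- x + rnorm(1, mu.x * dt, sd.step)
    y <- y + rnorm(1, mu.y * dt, sd.step)
    steps <- steps + 1
  }
  choice <- atan2(y, x) %% (2 * pi)  # radian at the point of intersection
  RT <- tzero + steps * dt           # decision time plus nondecision time
  c("choice" = choice, "RT" = RT)
}
```

Here \(\mu_x = \delta\cos\theta\) and \(\mu_y = \delta\sin\theta\); smaller values of `dt` give a finer approximation of the continuous process at the cost of more iterations.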
par(pty = "m", mar = c(3, 3, 3, 0))
hist(X.RW$bivariate.data[,2], main = "Response Times", col = col.RW(0.7))

The execution of the Random Walk emulator algorithm took approximately 6.7114 seconds.
Fig 2. Visual representation of the rejection sampling algorithm. The blue lines represent the joint density function of response times and choices prescribed by the CDDM. The dots represent the pairs of candidate values generated from the bivariate space of possible RTs and choices, plotted at the height of the random uniform(0, maxDensity) value used as the rejection criterion. If the rejection value is lower than the density function (i.e., the dot falls below the density curve), the candidate is accepted; if it surpasses the density (i.e., the dot is drawn above the curve), the candidate is rejected. Candidates are generated, tested, and kept in this way until the desired sample size is reached.
The execution of the Rejection sampling algorithm took approximately 1.8213 seconds.
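The procedure illustrated in Fig 2 can be sketched generically as follows. `dtarget` stands in for the CDDM joint density of (choice, RT), which is not reproduced here, and all names are illustrative (the usage in the test relies on a toy density):

```r
# Generic rejection-sampling sketch following Fig 2; `dtarget` is a
# stand-in for the CDDM joint density of (choice, RT)
sample.reject <- function(n, dtarget, min.RT, max.RT, maxDensity){
  out <- matrix(NA, nrow = n, ncol = 2,
                dimnames = list(NULL, c("choice", "RT")))
  kept <- 0
  while (kept < n) {
    candidate <- c(runif(1, 0, 2 * pi),       # candidate choice (radians)
                   runif(1, min.RT, max.RT))  # candidate RT
    u <- runif(1, 0, maxDensity)              # rejection criterion
    if (u < dtarget(candidate)) {             # dot falls below the curve
      kept <- kept + 1
      out[kept, ] <- candidate                # accept the candidate
    }
  }
  out
}
```

The acceptance rate depends on how tightly maxDensity bounds the target density over the candidate space, which is why a loose bound wastes candidates.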
The following Metropolis algorithm uses the density function to generate random observations under the CDDM. Please read the comments left throughout the code to get a better idea of how this algorithm works. To work, this algorithm only needs a list par that specifies the values of the four model parameters (in either parameterization, polar or Cartesian coordinates) and an arbitrary value max.RT that indicates the maximum RT we would expect to observe under a realistic scenario.
The execution of the Metropolis sampling algorithm took approximately 1.2488 seconds.
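A generic independence-Metropolis sketch of the idea (not the `sample.Metropolis.cddm` implementation referenced above) could look like this; `dtarget` again stands in for the CDDM joint density:

```r
# Independence-Metropolis sketch: propose (choice, RT) pairs uniformly
# and accept them in proportion to the target density ratio.
# `start` must be a point where `dtarget` is strictly positive.
sample.metropolis <- function(n, dtarget, start, max.RT, burn = 500){
  chain <- matrix(NA, nrow = n + burn, ncol = 2)
  current <- start
  d.current <- dtarget(current)
  for (i in 1:(n + burn)) {
    proposal <- c(runif(1, 0, 2 * pi),   # proposed choice (radians)
                  runif(1, 0, max.RT))   # proposed RT
    d.prop <- dtarget(proposal)
    if (runif(1) < d.prop / d.current) { # accept with prob min(1, ratio)
      current <- proposal
      d.current <- d.prop
    }
    chain[i, ] <- current
  }
  chain[-(1:burn), ]                     # drop burn-in samples
}
```

Because the uniform proposal density cancels in the acceptance ratio, only the target density values are compared; rejected proposals repeat the current state, so consecutive draws are correlated.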
par(pty = "m", mar = c(3, 3, 3, 0))
hist(X.invCDF[,2], main = "Response Times", col = col.invCDF(0.7))

The execution of the inverse-CDF algorithm took approximately 7.0074 seconds.
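For intuition, a one-dimensional inverse-CDF sampler can be sketched as below; the bivariate version used above is more involved, and the function name and grid-based inversion are our illustrative choices:

```r
# One-dimensional inverse-CDF sketch: tabulate the CDF on a grid and
# invert it by table lookup; `pFun` is any vectorized CDF on [lo, hi]
sample.invCDF.1d <- function(n, pFun, lo, hi, grid.size = 1e4){
  grid <- seq(lo, hi, length.out = grid.size)
  cdf <- pFun(grid)                 # CDF evaluated on the grid
  u <- runif(n)                     # uniform draws to invert
  idx <- findInterval(u, cdf) + 1   # first grid point with CDF >= u
  grid[pmin(idx, grid.size)]        # clamp draws beyond the grid's mass
}
```

The accuracy of this approach is limited by the grid resolution, and tabulating the CDF up front is what makes the method expensive when the CDF itself is costly to evaluate.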
# Load file containing custom eCDF function
source("./code/general_functions/eCDF.R")

RW.eCDF     <- myECDF(X.RW$bivariate.data)
Reject.eCDF <- myECDF(X.Reject)
Metro.eCDF  <- myECDF(X.Metro)
invCDF.eCDF <- myECDF(X.invCDF)

RW.tCDF     <- pCDDM(X.RW$bivariate.data, par$drift, par$theta, par$tzero, par$boundary)
Reject.tCDF <- pCDDM(X.Reject, par$drift, par$theta, par$tzero, par$boundary)
Metro.tCDF  <- pCDDM(X.Metro, par$drift, par$theta, par$tzero, par$boundary)
invCDF.tCDF <- pCDDM(X.invCDF, par$drift, par$theta, par$tzero, par$boundary)

Obtaining the approximate CDF for the data generated with each sampling algorithm took between 1.4566 and 1.5688 seconds.
# We build a simple function to compare these CDFs
getDifferences <- function(eCDF, tCDF){
  difference <- tCDF - eCDF
  difference.sum <- sum(difference)
  KS_statistic <- max(abs(difference)) # Kolmogorov–Smirnov statistic
  sq.difference <- sum(difference^2)
  output <- round(cbind(difference.sum, KS_statistic, sq.difference), 4)
  colnames(output) <- c("sumDiff", "KS-statistic", "SSDiff")
  return(output)
}

getDifferences(RW.eCDF, RW.tCDF)

##      sumDiff KS-statistic SSDiff
## [1,]  8.3044       0.0215 0.1574

getDifferences(Reject.eCDF, Reject.tCDF)

##       sumDiff KS-statistic SSDiff
## [1,] -10.4874       0.0171  0.114

getDifferences(Metro.eCDF, Metro.tCDF)

##      sumDiff KS-statistic SSDiff
## [1,] 10.8075       0.0253 0.3361

getDifferences(invCDF.eCDF, invCDF.tCDF)

##       sumDiff KS-statistic   SSDiff
## [1,] 1462.527       0.9999 548.3261
compareTime <- function(n.Samples, n.Datasets, par){
  times.RW     <- matrix(NA, nrow = n.Datasets, ncol = length(n.Samples))
  times.Reject <- matrix(NA, nrow = n.Datasets, ncol = length(n.Samples))
  times.Metro  <- matrix(NA, nrow = n.Datasets, ncol = length(n.Samples))
  seed <- 1
  for(m in 1:n.Datasets){
    set.seed(seed)
    i <- 1
    for(n in n.Samples){
      # Random Walk emulator
      start_time <- Sys.time()
      x <- sample.RW.cddm(n, par)
      end_time <- Sys.time()
      times.RW[m,i] <- round(end_time - start_time, 4)
      # Rejection sampler (reuses the largest RT observed as max.RT)
      start_time <- Sys.time()
      max.RT <- max(x$bivariate.data[,2])
      sample.Reject.cddm(n, par, max.RT, plot = FALSE)
      end_time <- Sys.time()
      times.Reject[m,i] <- round(end_time - start_time, 4)
      # Metropolis sampler
      start_time <- Sys.time()
      sample.Metropolis.cddm(n, par, max.RT = NA, plot = FALSE)
      end_time <- Sys.time()
      times.Metro[m,i] <- round(end_time - start_time, 4)
      i <- i + 1
      seed <- seed + 1
    }
  }
  return(list("times.RW" = times.RW,
              "times.Reject" = times.Reject,
              "times.Metro" = times.Metro))
}

Let's briefly explore the results we get when we try a second set of parameter values with a much larger response boundary (and hence a much slower random walk process).
# Arbitrary set of parameter values
par2 <- list("drift"    = 1,
             "theta"    = pi,
             "tzero"    = 0.1,
             "boundary" = 7)
n <- 5000 # No. samples

##      sumDiff KS-statistic SSDiff
## [1,] -11.9736 0.0179 0.1219
## sumDiff KS-statistic SSDiff
## [1,] -4.9641 0.0203 0.1044
## sumDiff KS-statistic SSDiff
## [1,] 108.5622 0.1513 16.7825
## sumDiff KS-statistic SSDiff
## [1,] 1481.599 0.994 571.9499